Smart Paper Notes

TweetDrought: A Deep-Learning Drought Impacts Recognizer based on Twitter Data

Beichen Zhang , Frank Schilder , Kelly Helm Smith , Michael J. Hayes , Sherri Harms , Tsegaye Tadesse

Categories: Natural Language Processing | Machine Learning

2022-12-07

Acquiring a better understanding of drought impacts becomes increasingly vital under a warming climate. Traditional drought indices describe mainly biophysical variables and not impacts on social, economic, and environmental systems. We used natural language processing and transfer learning based on Bidirectional Encoder Representations from Transformers (BERT): the model was fine-tuned on data from the news-based Drought Impact Reporter (DIR) and then applied to recognize seven types of drought impacts in filtered Twitter data from the United States. Our model achieved a satisfying macro-F1 score of 0.89 on the DIR test set. The model was then applied to California tweets and validated against keyword-based labels, where the macro-F1 score was 0.58. However, because keyword labels are limited, we also spot-checked tweets with conflicting labels; 83.5% of the BERT labels were correct when compared with the keyword labels. Overall, the fine-tuned BERT-based recognizer provided proper predictions and valuable information on drought impacts. The interpretation and analysis of the model were consistent with experiential domain expertise.
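
The macro-F1 scores quoted above are unweighted means of per-class F1 scores, so rare impact types count as much as common ones. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes one label per item for simplicity, and the label names and data are hypothetical.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (single-label case)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical labels for illustration only (not data from the paper).
y_true = ["agriculture", "fire", "fire", "water", "water", "water"]
y_pred = ["agriculture", "fire", "water", "water", "water", "fire"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))
```

Here the per-class F1 scores are 1, 1/2, and 2/3, so the macro average is 13/18.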


Quantitative Assessment of Drought Impacts Using XGBoost based on the Drought Impact Reporter

Beichen Zhang , Fatima K. Abu Salem , Michael J. Hayes , Tsegaye Tadesse

Categories: Machine Learning

2022-11-04

Under climate change, the increasing frequency, intensity, and spatial extent of drought events lead to higher socio-economic costs. However, the relationships between hydro-meteorological indicators and drought impacts are not yet well identified because of their complexity and data scarcity. In this paper, we propose a framework based on the extreme gradient boosting model (XGBoost) to predict multi-category drought impacts for Texas, connecting a typical drought indicator, the Standardized Precipitation Index (SPI), to the text-based impacts from the Drought Impact Reporter (DIR). The preliminary results show that the trained models perform very well in assessing drought impacts on agriculture, fire, society & public health, plants & wildlife, and relief, response & restrictions in Texas. The framework also makes it possible to appraise drought impacts from hydro-meteorological indicators across the United States, which could help drought risk management by providing additional information and increasing the updating frequency of drought impacts. Our interpretation results, obtained with the Shapley additive explanation (SHAP) interpretability technique, revealed that the rules guiding the XGBoost predictions agree with domain expertise on the role that SPI indicators play in drought impacts.
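
SHAP attributes a prediction to individual features using Shapley values. The sketch below computes exact Shapley values by brute force for a toy scoring function; it is not the SHAP library's tree algorithm and not the authors' model. The three SPI time scales, the coefficients, and the all-zero baseline are hypothetical choices made for illustration.

```python
from itertools import permutations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal
    contribution over every ordering in which features are revealed."""
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]       # reveal feature i
            new = model(current)
            phi[i] += new - prev    # marginal contribution of feature i
            prev = new
    return [v / factorial(n) for v in phi]

# Hypothetical toy scorer: drier (more negative) SPI raises the impact score.
def toy_model(features):
    spi_1, spi_6, spi_12 = features
    return -0.5 * spi_1 - 1.0 * spi_6 - 0.2 * spi_12 + 0.3 * spi_1 * spi_6

x = [-1.0, -2.0, -0.5]        # hypothetical SPI values at three time scales
baseline = [0.0, 0.0, 0.0]    # assumed reference point (near-normal conditions)
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is split evenly between the two features involved.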


BON: An extended public domain dataset for human activity recognition

Girmaw Abebe Tadesse , Oliver Bent , Komminist Weldemariam , Md. Abrar Istiak , Taufiq Hasan , Andrea Cavallaro

Categories: Computer Vision

2022-09-12

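Body-worn first-person vision (FPV) cameras make it possible to extract rich information about the environment from the subject's viewpoint. However, research on wearable camera-based egocentric office activity understanding has progressed slowly compared with other activity environments (e.g., kitchen and outdoor ambulatory settings), mainly because of the lack of adequate datasets for training more sophisticated (e.g., deep learning) models for human activity recognition in office environments. This paper describes a large, publicly available office activity dataset (BON) collected with a chest-mounted GoPro Hero camera in different office settings across three geographical locations: Barcelona (Spain), Oxford (UK), and Nairobi (Kenya). The BON dataset contains eighteen common office activities that can be grouped into person-to-person interactions (e.g., chatting with colleagues), person-to-object interactions (e.g., writing on a whiteboard), and proprioceptive activities (e.g., walking). Annotations are provided for each 5-second video segment. In total, BON contains 25 subjects and 2639 segments. To facilitate further research in this sub-domain, we also provide results that can be used as baselines for future studies.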


Multimodal Feature Extraction for Memes Sentiment Classification

Sofiane Ouaari , Tsegaye Misikir Tashu , Tomas Horvath

Categories: Artificial Intelligence

2022-07-07

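In this study, we propose feature extraction for multimodal meme classification using deep learning approaches. A meme is usually a photo or video with text, shared by the young generation on social media platforms, that expresses a culturally relevant idea. Because memes are an efficient way to express emotions and feelings, a good classifier that can identify the sentiment behind a meme is important. To make the learning process more efficient, reduce the likelihood of overfitting, and improve the generalizability of the model, a good approach is needed for joint feature extraction from all modalities. In this work, we propose to use different multimodal neural network approaches for multimodal feature extraction and to use the extracted features to train a classifier that identifies the sentiment in a meme.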


Multimodal E-Commerce Product Classification Using Hierarchical Fusion

Tsegaye Misikir Tashu , Sara Fattouh , Peter Kiss , Tomas Horvath

Categories: Artificial Intelligence

2022-07-07

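In this work, we present a multimodal model for commercial product classification that combines features extracted by multiple neural network models from textual data (CamemBERT and FlauBERT) and visual data (SE-ResNeXt-50) using simple fusion techniques. The proposed method significantly outperformed the unimodal models as well as the reported performance of similar models on our specific task. We experimented with multiple fusion techniques and found that the best-performing way to combine the individual embeddings of the unimodal networks is based on combining concatenation and averaging of the feature vectors. Each modality compensated for the shortcomings of the others, demonstrating that increasing the number of modalities can be an effective way to improve performance on multi-label and multimodal classification problems.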
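
The fusion step described above can be sketched as follows. This is one possible reading, since the abstract does not spell out how concatenation and averaging are combined: here the concatenated vector is simply extended with the element-wise average. The function names, the tiny 2-dimensional embeddings, and the assumption that all embeddings share one dimension are illustrative only.

```python
def concatenate(vectors):
    """Join the embeddings end to end."""
    return [value for vec in vectors for value in vec]

def average(vectors):
    """Element-wise mean; requires embeddings of equal length."""
    dim = len(vectors[0])
    if any(len(vec) != dim for vec in vectors):
        raise ValueError("averaging requires embeddings of equal length")
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]

def fuse(vectors):
    """Assumed fusion: concatenated features followed by averaged features."""
    return concatenate(vectors) + average(vectors)

# Hypothetical embeddings from two text encoders and one image encoder.
text_a = [1.0, 0.0]
text_b = [0.0, 2.0]
image = [2.0, 4.0]
fused = fuse([text_a, text_b, image])
print(fused)
```

In practice the unimodal embeddings would usually be projected to a common size before averaging, and the fused vector would feed a classification head.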


Deep Learning Architecture for Automatic Essay Scoring

Tsegaye Misikir Tashu , Chandresh Kumar Maurya , Tomas Horvath

Categories: Natural Language Processing | Artificial Intelligence

2022-06-16

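Automatic evaluation of essays (AES), also called automatic essay scoring, has become a pressing problem with the rise of online learning and evaluation platforms such as Coursera, Udemy, and Khan Academy. Researchers have recently proposed many techniques for automatic evaluation. However, many of these techniques rely on hand-crafted features and are therefore limited in their feature representation. Deep learning has emerged as a new paradigm in machine learning that can exploit large amounts of data and identify the features useful for essay evaluation. To this end, we propose a novel architecture based on recurrent neural networks (RNN) and convolutional neural networks (CNN). In the proposed architecture, a multichannel convolutional layer learns and captures the contextual features of word n-grams from the word embedding vectors, together with the essential semantic concepts, and forms an essay-level feature vector using a max-pooling operation. A variant of the RNN called the bidirectional gated recurrent unit (BGRU) is used to access both previous and subsequent contextual representations. The experiments were carried out on eight datasets available on Kaggle for the AES task. The experimental results show that our proposed system achieves significantly higher grading accuracy than other deep learning-based AES systems and other state-of-the-art AES systems.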


A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture

Mohsen Jafarzadeh , Stephen Brooks , Shimeng Yu , Balakrishnan Prabhakaran , Yonas Tadesse

Categories: Robotics | Artificial Intelligence

2022-01-06

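Currently, most social robots interact with their surroundings and with humans through sensors that are integral parts of the robot, which limits the usability of the sensors, human-robot interaction, and interchangeability. Many applications need a wearable sensor garment that fits many robots. This article presents an affordable wearable sensor vest and an open-source software architecture with the Internet of Things (IoT) for social humanoid robots. The vest consists of touch, temperature, gesture, distance, and vision sensors and a wireless communication module. The IoT feature allows the robot to interact with humans both locally and over the Internet. The designed architecture works for any social robot that has a general-purpose graphics processing unit (GPGPU), I2C/SPI buses, an Internet connection, and the Robot Operating System (ROS). The modular design of the architecture enables developers to easily add, remove, or update complex behaviors. The proposed software architecture provides IoT technology, GPGPU nodes, I2C and SPI bus managers, audio-visual interaction nodes (speech to text, text to speech, and image understanding), and isolation between behavior nodes and other nodes. The proposed IoT solution consists of related nodes in the robot, a RESTful web service, and user interfaces. We use the HTTP protocol for two-way communication with the social robot over the Internet. Developers can easily edit or add nodes in the C, C++, and Python programming languages. Our architecture can be used to design more sophisticated behaviors for social humanoid robots.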


Sparsity-based Feature Selection for Anomalous Subgroup Discovery

Girmaw Abebe Tadesse , William Ogallo , Catherine Wanjiru , Charles Wachira , Isaiah Onando Mulang' , Vibha Anand , Aisha Walcott-Bryant , Skyler Speakman

Categories: Machine Learning | Artificial Intelligence

2022-01-06

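Anomalous pattern detection aims to identify instances where deviation from normalcy is evident, and it is widely applicable across domains. Multiple anomaly detection techniques have been proposed in the state of the art. However, a principled and scalable feature selection method for efficient discovery is commonly lacking. Existing feature selection techniques are usually driven by optimizing the performance of outcome prediction rather than by systemic deviations from the expected. In this paper, we propose a sparsity-based automated feature selection (SAFS) framework, which encodes systemic outcome deviations via the sparsity of feature-driven odds ratios. SAFS is a model-agnostic approach that can be used with different discovery techniques. When validated on publicly available critical care datasets, SAFS achieves a more than 3× reduction in computation time while maintaining detection performance. SAFS also yields superior performance compared with multiple feature selection baselines.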
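
To illustrate what "sparsity of feature-driven odds ratios" could look like, the sketch below computes, for each feature, the odds ratio of a positive outcome for every feature value against the rest, and then scores how unevenly those ratios are distributed. The abstract does not say how sparsity is measured; the Gini index used here, the 0.5 smoothing constant, and the toy data are all assumptions for illustration.

```python
def odds_ratios(values, outcomes, smoothing=0.5):
    """Odds ratio of outcome == 1 for each feature value versus all other values."""
    ratios = {}
    pairs = list(zip(values, outcomes))
    for v in sorted(set(values)):
        a = sum(1 for x, y in pairs if x == v and y == 1)
        b = sum(1 for x, y in pairs if x == v and y == 0)
        c = sum(1 for x, y in pairs if x != v and y == 1)
        d = sum(1 for x, y in pairs if x != v and y == 0)
        # Smoothing (an assumed choice) avoids division by zero in empty cells.
        ratios[v] = ((a + smoothing) * (d + smoothing)) / ((b + smoothing) * (c + smoothing))
    return ratios

def gini_index(xs):
    """Gini index of non-negative numbers: 0 when all values are equal."""
    xs = sorted(xs)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical records: "ward" tracks the outcome closely, "shift" does not.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "ward": ["icu", "icu", "icu", "gen", "gen", "gen", "gen", "gen"],
    "shift": ["day", "night", "day", "night", "day", "night", "day", "night"],
}
sparsity = {
    name: gini_index(list(odds_ratios(vals, outcomes).values()))
    for name, vals in features.items()
}
ranked = sorted(sparsity, key=sparsity.get, reverse=True)
print(ranked)
```

Under these assumptions, a feature whose odds ratios are concentrated in a few values gets a higher score and would be kept ahead of features whose values all relate to the outcome similarly.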


Automated Supervised Feature Selection for Differentiated Patterns of Care

Catherine Wanjiru , William Ogallo , Girmaw Abebe Tadesse , Charles Wachira , Isaiah Onando Mulang' , Aisha Walcott-Bryant

Categories: Machine Learning | Artificial Intelligence

2021-11-05

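An automated feature selection pipeline was developed using several state-of-the-art feature selection techniques to select optimal features for Differentiating Patterns of Care (DPOC). The pipeline includes three types of feature selection techniques (filter, wrapper, and embedded methods) to select the top K features. Five different datasets with binary dependent variables were used, and their respective top K optimal features were selected. The selected features were tested in the existing multi-dimensional subset scanning (MDSS) pipeline, where the most anomalous subpopulations, most anomalous subsets, propensity scores, and measured effects were recorded to assess performance. This performance was compared with the same four metrics obtained when all covariates in the dataset were used in the MDSS pipeline. We found that, regardless of the feature selection technique used, the data distribution is key to consider when deciding which technique to use.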
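
As an illustration of the filter family mentioned above, the sketch below ranks features by the absolute Pearson correlation with a binary target and keeps the top K. It is a minimal example of one filter method, not the authors' pipeline; the scoring function, the column names, and the data are hypothetical.

```python
from statistics import mean

def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either variable is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5

def top_k_features(columns, target, k):
    """Filter-style selection: score each feature independently, keep the best k."""
    scores = {name: abs_correlation(values, target) for name, values in columns.items()}
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]

# Hypothetical dataset with a binary dependent variable.
columns = {
    "age": [30, 45, 50, 62, 70, 81],
    "noise": [5, 1, 4, 2, 5, 3],
    "lab_value": [1.0, 1.2, 0.9, 2.8, 3.1, 3.0],
}
target = [0, 0, 0, 1, 1, 1]
selected = top_k_features(columns, target, k=2)
print(selected)
```

Wrapper and embedded methods differ in that they score features through a trained model rather than independently, which is why the three families can disagree on the same data.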
